DITTO Clean Up and Local Prediction instructions #22

JmScherer · 2025-12-17T15:04:50Z

Please check if the PR fulfills these requirements

Tests were run as per the documentation and passed
Docs have been added / updated (for bug fixes / features)

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

- Updated the .gitignore so that DITTO output and NextFlow work directories are not committed
- Updated the root project's README.md to include new instructions, requirements, document linking, and notes
- Consolidated the various configuration files into configs subfolders, one per service, and updated code to reflect the new paths
- Added a python=3.10 dependency to the configs/conda/open-cravat.yaml conda environment
- Included a configs/nextflow/local.config file to allow NextFlow to use Anaconda when running DITTO locally on a device
- Consolidated the shap_plots directory into the docs directory
- Updated pipeline.nf to create the designated output directory along with any missing parent folders
  - Default output folder is $PWD/data/output
- Updated the HPC slurm model.job file to output DITTO scores to $PWD/data/output when finished
  - This can be overridden to user preference, but it must be a full path; otherwise NextFlow won't be able to access the directory and save results
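Because the output location has to be a full path, one way to avoid mistakes is to resolve the directory to an absolute path before handing it to NextFlow. The snippet below is a minimal sketch and not part of this PR: `abs_dir` is a hypothetical helper, and the demo runs in a scratch directory rather than in the repo. How the resulting path is passed to the pipeline is left to the README.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: create a directory (and any missing parents),
# then print its absolute, symlink-free path.
abs_dir() {
  mkdir -p "$1"
  (cd "$1" && pwd -P)
}

# Demo in a scratch directory so nothing in the repo is touched.
scratch=$(mktemp -d)
cd "$scratch"

# A relative path such as data/output becomes a full path.
out_dir=$(abs_dir "data/output")
echo "$out_dir"
```

The printed value always starts with `/`, which is the form the note above asks for.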

What is the current behavior? (You can also link to an open issue here)

DITTO is difficult to run and this should help streamline the process.

What is the new behavior (if this is a feature change)?

This update is intended to make DITTO more user friendly and easier to run.

Does this PR introduce a breaking change?

N/A

To Review:

Static Code Analysis by Reviewer
Clone repo and change to the local-prediction branch
Depending on the environment follow the HPC Prediction with Cheaha or Local Prediction instructions and notes below

HPC Prediction with Cheaha

Follow the README.md for HPC Prediction with Cheaha

In the root DITTO folder, run tail -f DITTO_logs.out to see the output

Local Prediction Notes:

The reviewer's local device will need access to all of the OpenCravat annotators, which take roughly 600GB of disk space. Contact the PR assignee for an external drive containing the data if you would like to test this route.

It looks like the OpenCravat conda environment needs to be created and the module path set before DITTO can run. The NextFlow pipeline is supposed to set the path, but it doesn't seem to do so.

Create OpenCravat conda environment
- conda create --name opencravat-env
- conda activate opencravat-env
- conda env update -n opencravat-env --file ./configs/conda/open-cravat.yaml
- oc config md /Volumes/my_book/opencravat/modules
- conda deactivate
Follow the README.md for Local Prediction
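Since the pipeline does not appear to set the module path on its own, a quick preflight check before launching can save a failed run. This is only a sketch under assumptions: `check_modules_dir` is a hypothetical helper, and the scratch directory and `example_annotator` name stand in for the real OpenCravat modules location (for example, the path given to `oc config md`).

```shell
#!/bin/sh
set -eu

# Hypothetical helper: succeed only if the modules directory exists and is not empty.
check_modules_dir() {
  if [ ! -d "$1" ]; then
    echo "modules directory not found: $1" >&2
    return 1
  fi
  if [ -z "$(ls -A "$1")" ]; then
    echo "modules directory is empty: $1" >&2
    return 1
  fi
}

# Demo against a scratch directory; replace with the real modules path.
scratch=$(mktemp -d)
modules_dir="$scratch/opencravat/modules"
mkdir -p "$modules_dir/annotators/example_annotator"

check_modules_dir "$modules_dir" && echo "modules directory looks usable: $modules_dir"
```

The check only confirms that the directory is present and populated; it does not verify that every annotator DITTO needs is installed.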

Local Prediction

[Screenshots: .test_data/file_list.txt; Running DITTO locally; DITTO completing locally]

HPC Prediction with Cheaha

[Screenshots: .test_data/file_list.txt; Running DITTO with HPC; DITTO completing on Cheaha]


wilkb777

Just a minor change to make testing the pipeline better/easier (requiring no file modifications out of the box) and corresponding changes to documentation.

wilkb777 · 2025-12-17T17:30:42Z

pipeline.nf

 workflow {
-
  // Define input channels for the VCF files
  vcfFile = Channel.fromPath(params.sample_sheet).splitCsv(header: false)


Suggested change

-  vcfFile = Channel.fromPath(params.sample_sheet).splitCsv(header: false)
+  vcfFile = Channel.fromPath(params.sample_sheet)
+    .splitText() // Emit each line as a separate item
+    .map { line ->
+        // Keep absolute paths as-is; resolve relative paths against workflow.launchDir
+        if (line.startsWith("/")){
+            return line.trim()
+        } else {
+            def abs_path = file(workflow.launchDir).resolve(line.trim())
+            return abs_path
+        }
+    }
+    .map { path_obj ->
+        // Ensure the path is a proper Path object for staging
+        file(path_obj, checkIfExists: true)
+    }

This enables relative paths in the input sample sheet text files. Paths are resolved relative to the launch directory, which is the directory the pipeline.nf workflow was run from. It also lets you delete the bit in the README instructions about having to specify full paths for inputs (which cleans up testing nicely). I've already tested that this works locally and on Cheaha (assuming sbatch is run from the repo's root directory), so this can be added directly with no further review.

JmScherer · 2025-12-18

Thanks for this! This will simplify the process further.

JmScherer · 2026-01-08

Updated the README.md to reflect this change. The file list now supports both absolute and relative paths for vcf.gz files.
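For readers less familiar with Nextflow, the relative-path rule from the pipeline.nf suggestion can be sketched in plain shell. This is an illustration only, not code from the PR: `resolve_entry` is a hypothetical helper, the sample file names are made up, and a scratch directory stands in for the launch directory. Unlike the Groovy version, it trims whitespace before checking for a leading `/`.

```shell
#!/bin/sh
set -eu

# Hypothetical helper. $1 = launch directory, $2 = one line from the file list.
# Absolute entries pass through; relative entries are resolved against $1.
resolve_entry() {
  # Strip leading and trailing whitespace from the entry.
  entry=$(printf '%s' "$2" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
  case "$entry" in
    /*) printf '%s\n' "$entry" ;;
    *)  printf '%s\n' "${1%/}/$entry" ;;
  esac
}

# Demo setup with made-up file names in a scratch "launch directory".
launch_dir=$(mktemp -d)
mkdir -p "$launch_dir/.test_data"
: > "$launch_dir/.test_data/sample1.vcf.gz"
: > "$launch_dir/sample2.vcf.gz"
printf '%s\n' ".test_data/sample1.vcf.gz" "$launch_dir/sample2.vcf.gz" \
  > "$launch_dir/file_list.txt"

# Resolve every non-blank line and fail early on a missing input,
# which is the same intent as checkIfExists: true.
resolved=""
while IFS= read -r line || [ -n "$line" ]; do
  [ -n "$line" ] || continue
  path=$(resolve_entry "$launch_dir" "$line")
  [ -e "$path" ] || { echo "missing input: $path" >&2; exit 1; }
  resolved="$resolved$path
"
done < "$launch_dir/file_list.txt"

printf '%s' "$resolved"
```

Running the sketch prints both entries as full paths under the scratch directory, one per line.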

wilkb777 · 2025-12-17T18:27:47Z

README.md

+- Update the `.test_data/file_list.txt` (inout vcfs) files with complete file paths and submit a slurm job using the
+command below


Suggested change

-- Update the `.test_data/file_list.txt` (inout vcfs) files with complete file paths and submit a slurm job using the
-command below
+- Create a text file listing the path to VCF file(s) (1 path per line) with variants to score
+  - Paths can be full absolute paths **or** relative paths (relative to the directory where the pipeline will be run from, **not** the directory where the `pipeline.nf` file is)
+- See the example input file [.test_data/file_list.txt](.test_data/file_list.txt) (lists 2 testing example input VCFs)
+  for reference or as an input file for testing (default behavior of `model.job`)

Updated this text to be more explicit and clear about what to provide as input (see the suggestion on supporting relative pathing for input files in pipeline.nf).

JmScherer · 2026-01-08

In the README.md, provided real examples of relative and absolute pathing in addition to this clarification.

README.md

Co-authored-by: Brandon M Wilk

sdhutchins

My comments are mainly suggestions and not requirements. This is good to go after @wilkb777's suggested changes go in.

README.md

configs/conda/open-cravat.yaml


JmScherer added 15 commits December 12, 2025 13:06

- 706751c: Initial cleaning up the configs folder, organizing by service, and updating documentation to reflect new path
- 23f59d4: Updating the configs folder for the nextflow configurations
- 82168db: moved shap_plots to docs and removed hardcoded path from ./.test_data/file_list.txt
- 92bc76e: Update the model.job with some changes and the pipeline.nf
- 0deeada: Moved oc ditto package into config/opencravat
- 9f41cc1: Updated readme
- 774e32d: Updating the README.md
- dbf095c: More README.md updates
- 2bf24f3: Markdown linting on README.md
- a432c54: Hopefully finished README markdown linting
- 471d0ba: Updated README.md to include local.config for local prediction instructions and updated the .gitignore to ignore the work_dir for temporary nextflow runs
- 5d89a29: Worked out an output directory path and updated the documentation
- db5d1ef: fixing ./text_data/file_list.txt
- 1c5bc9d: Updating model.job output folder
- 9b7a691: updating the folder for output on nextflow command

JmScherer requested review from sdhutchins and wilkb777 December 17, 2025 15:04

JmScherer self-assigned this Dec 17, 2025

JmScherer added the documentation and enhancement labels Dec 17, 2025

wilkb777 requested changes Dec 17, 2025


JmScherer and others added 4 commits December 18, 2025 14:06

- 54ec4ec: Update pipeline.nf (Co-authored-by: Brandon M Wilk)
- f9d8c03: Update README.md (Co-authored-by: Brandon M Wilk)
- 305940e: Update README.md (Co-authored-by: Brandon M Wilk)
- 6cd3306: Update README.md (Co-authored-by: Brandon M Wilk)

sdhutchins approved these changes Dec 19, 2025

Resolved review threads: README.md, README.md (outdated), configs/conda/open-cravat.yaml

JmScherer added 4 commits January 6, 2026 14:55

- 4104446: Markdown linting for README
- 3122d20: Added ditto-env.yaml to conda envs, updated README to reflect, and added names to other environments
- b585a5b: Updated the README to discuss relative and absolute pathing for the file_list.txt for pathing on vcfs for DITTO; Updated README to discuss the use of Mamba for NextFlow
- 8692431: markdown linting
- b9c4bd1: Updated the ditto-nf.yaml conda env to include h5py binary as it was throwing errors. We suspect the version it pulled automatically was too new for the pipeline


